Recent research has revealed that the output of deep neural networks (DNNs) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. To this end, we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution (DE). It requires less adversarial information (making it a black-box attack) and can fool more types of networks due to the inherent features of DE. The results show that 67.97% of the natural images in the Kaggle CIFAR-10 test dataset and 16.04% of the ImageNet (ILSVRC 2012) test images can be perturbed to at least one target class by modifying just one pixel, with 74.03% and 22.91% confidence on average, respectively. We also show the same vulnerability on the original CIFAR-10 dataset. Thus, the proposed attack explores a different take on adversarial machine learning in an extremely limited scenario, showing that current DNNs are also vulnerable to such low-dimensional attacks. In addition, we illustrate an important application of DE (or, broadly speaking, evolutionary computation) in the domain of adversarial machine learning: creating tools that can effectively generate low-cost adversarial attacks against neural networks for evaluating robustness.
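A minimal sketch of the search described above, using SciPy's differential evolution to find a single (x, y, r, g, b) perturbation. Here `predict_probs` is a hypothetical black-box that returns class probabilities for one image and stands in for the attacked DNN; all parameter values are illustrative.

```python
# One-pixel attack sketch driven by differential evolution (untargeted variant).
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(image, true_label, predict_probs, img_size=32):
    # A candidate solution encodes a single pixel: (x, y, r, g, b).
    bounds = [(0, img_size - 1), (0, img_size - 1), (0, 255), (0, 255), (0, 255)]

    def apply(candidate):
        x, y, r, g, b = candidate
        perturbed = image.copy()
        perturbed[int(x), int(y)] = [r, g, b]
        return perturbed

    def fitness(candidate):
        # Untargeted objective: drive down the confidence of the true class.
        return predict_probs(apply(candidate))[true_label]

    result = differential_evolution(fitness, bounds, maxiter=75,
                                    popsize=20, recombination=1.0, seed=0)
    return apply(result.x), result.fun
```

A targeted variant would instead maximize the probability of a chosen target class.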
Transparency of machine learning models used for decision support in various industries is becoming essential for ensuring their ethical use. To that end, feature attribution methods such as SHAP (SHapley Additive exPlanations) are widely used to explain the predictions of black-box machine learning models to customers and developers. However, a parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data. Such models, trained over horizontally or vertically partitioned data, present a challenge for explainable AI because the explaining party may have a biased view of the background data or a partial view of the feature space. As a result, explanations obtained from different participants of distributed machine learning might not be consistent with one another, undermining trust in the product. This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm (KernelSHAP) and the Data Collaboration method of privacy-preserving distributed machine learning. In particular, we present three algorithms for different scenarios of explainability in Data Collaboration and verify their consistency with experiments on open-access datasets. Our results demonstrate a significant (by at least a factor of 1.75) decrease in feature attribution discrepancies among the users of distributed machine learning.
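A minimal sketch of the KernelSHAP building block the framework relies on, not the paper's Data Collaboration algorithms themselves: it only illustrates how a participant could attribute a shared model's predictions against an agreed-upon background dataset so that all parties start from a common baseline. The model type and argument names are placeholders.

```python
# Model-agnostic attribution with KernelSHAP against a shared background set.
import shap  # pip install shap
from sklearn.linear_model import LogisticRegression

def explain_with_shared_background(model: LogisticRegression,
                                   shared_background, samples_to_explain):
    # Only the prediction function is exposed to the explainer.
    explainer = shap.KernelExplainer(model.predict_proba, shared_background)
    # Returns one attribution per feature, per explained sample (per output class).
    return explainer.shap_values(samples_to_explain)
```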
We introduce knowledge-driven program synthesis (KDPS) as a variant of the program synthesis task, in which an agent is required to solve a sequence of program synthesis problems. In KDPS, the agent should use knowledge gained from earlier problems to solve later ones. We propose a novel PushGP-based method to solve the KDPS problem, which uses subprograms as knowledge. The proposed method extracts subprograms from the solutions of previously solved problems with an even partitioning (EP) method and uses these subprograms to solve upcoming programming tasks via adaptive replacement mutation (ARM). We call this method PushGP+EP+ARM. With PushGP+EP+ARM, no human effort is needed in the knowledge extraction and utilization processes. We compare the proposed method with PushGP and with a method that uses subprograms extracted manually by humans. Compared with PushGP, our PushGP+EP+ARM achieves better train error, higher success counts, and faster convergence. Moreover, we demonstrate the superiority of PushGP+EP+ARM when solving a sequence of six program synthesis problems in succession.
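A minimal, library-free sketch of the two knowledge operators, treating a program as a flat list of instruction tokens. The function names, the fixed replacement rate, and the segment length are illustrative assumptions, not the paper's PushGP implementation.

```python
# Knowledge extraction (even partitioning) and reuse (replacement mutation) sketch.
import random

def even_partition(program, n_parts):
    """Split a solved program into roughly equal subprograms stored as knowledge."""
    size = max(1, len(program) // n_parts)
    return [program[i:i + size] for i in range(0, len(program), size)]

def replacement_mutation(program, knowledge, rate=0.1):
    """Occasionally replace a random segment of the program with an archived subprogram."""
    if not knowledge or random.random() > rate:
        return list(program)
    start = random.randrange(len(program))
    end = min(len(program), start + random.randint(1, 5))
    return program[:start] + random.choice(knowledge) + program[end:]
```

In the paper the replacement rate is adapted during the run; here it is kept constant for brevity.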
Multi-source data fusion, in which multiple data sources are analyzed jointly to obtain improved information, has attracted considerable research attention. For datasets held by multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis, which shares dimensionality-reduced intermediate representations without iterative cross-institutional communication, can be suitable. When analyzing data that includes personal information, the identifiability of the shared data is a major concern. In this study, the identifiability of DC analysis is investigated. The results show that the shared intermediate representations are readily identifiable with respect to the original data used for supervised learning. This study then proposes a non-readily identifiable DC analysis that shares only non-readily identifiable data for multiple medical datasets that include personal information. The proposed method resolves the identifiability issue based on random sample permutation, the concept of interpretable DC analysis, and the use of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method exhibits non-readily identifiability while maintaining the high recognition performance of conventional DC analysis. For a dataset of hospitals, the proposed method shows a nine-percentage-point improvement in recognition performance over local analysis that uses only local datasets.
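A minimal sketch of the intermediate-representation step of DC analysis, assuming PCA as each institution's private dimensionality-reduction function; the non-readily-identifiable safeguards proposed in the paper (sample permutation, non-reconstructible functions) are not shown.

```python
# Each institution reduces its own data and a common anchor dataset with a
# private mapping and shares only the reduced representations, never raw records.
import numpy as np
from sklearn.decomposition import PCA

def make_intermediate_representation(local_data, anchor_data, dim=10, seed=0):
    # The reduction function stays inside the institution and is never shared.
    reducer = PCA(n_components=dim, random_state=seed).fit(local_data)
    return reducer.transform(local_data), reducer.transform(anchor_data)
```

The anchor representations from all institutions are then centralized to build the collaboration representation used for integrated analysis.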
Recently, data collaboration (DC) analysis has been developed to provide privacy-preserving integrated analysis across multiple institutions. DC analysis centralizes individually constructed dimensionality-reduced intermediate representations and realizes integrated analysis via a collaboration representation, without sharing the raw data. To construct the collaboration representation, each institution generates and shares a shareable anchor dataset and centralizes its intermediate representation. Although a random anchor dataset works well for DC analysis, an anchor dataset whose distribution is close to that of the raw dataset is expected to improve recognition performance, particularly for interpretable DC analysis. Based on an extension of the synthetic minority over-sampling technique (SMOTE), this study proposes an anchor data construction technique that improves recognition performance without increasing the risk of data leakage. Numerical results demonstrate the efficiency of the proposed SMOTE-based method over existing anchor data constructions on artificial and real-world datasets. Specifically, the proposed method achieves performance improvements of 9 and 38 percentage points over existing methods on an income dataset. The proposed method offers another use of SMOTE: not for imbalanced data classification, but as a key technology for privacy-preserving integrated analysis.
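A minimal sketch of the SMOTE-style interpolation at the core of the proposed anchor construction: each synthetic anchor point lies on the segment between a raw sample and one of its nearest neighbors, so the anchor distribution resembles the raw data. The leakage-aware extension described in the paper is not reproduced.

```python
# SMOTE-like interpolation used to build anchor data.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like_anchor(raw_data, n_anchor, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(raw_data)
    _, idx = nn.kneighbors(raw_data)              # idx[:, 0] is the point itself
    base = rng.integers(0, len(raw_data), n_anchor)
    neigh = idx[base, rng.integers(1, k + 1, n_anchor)]
    lam = rng.random((n_anchor, 1))                # interpolation weights in [0, 1)
    return raw_data[base] + lam * (raw_data[neigh] - raw_data[base])
```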
In recent years, the development of privacy-preserving techniques for causal inference from distributed data has attracted attention. To address this problem, we propose a data collaboration quasi-experiment (DC-QE), which enables causal inference from distributed data while preserving privacy. Our method preserves the privacy of private data by sharing only dimensionality-reduced intermediate representations, which are constructed individually by each party. Moreover, our method can reduce both random errors and biases, whereas existing methods can only reduce random errors in treatment-effect estimation. Through numerical experiments on artificial and real-world data, we confirm that our method can yield better estimation results than individual analyses. As our method spreads, intermediate representations could be published as open data to help researchers find causal relationships and accumulate a knowledge base.
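A minimal sketch of the quasi-experimental step run on pooled, dimensionality-reduced representations, using a standard inverse-probability-weighting estimator as a stand-in; the full DC-QE construction (collaboration representation, bias reduction) is not reproduced here.

```python
# Average treatment effect via inverse probability weighting on shared representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_treatment_effect(reps, treated, outcome):
    """reps: pooled intermediate representations; treated: 0/1 array; outcome: numeric array."""
    # Propensity scores estimated from the shared representations, not raw covariates.
    ps = LogisticRegression(max_iter=1000).fit(reps, treated).predict_proba(reps)[:, 1]
    w1, w0 = treated / ps, (1 - treated) / (1 - ps)
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)
```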
Large collections of quantified online user activity data, such as weekly web search volumes, which co-evolve with the mutual influences among multiple queries and locations, serve as an important social sensor. By discovering the latent interactions in such data, i.e., the ecosystem among queries and the influence flows among regions, future activities can be forecast accurately. However, this is a difficult problem in terms of both the volume of data and the complexity of the patterns that govern the dynamics. To address this problem, we propose FluxCube, an effective mining method that forecasts large collections of co-evolving online user activities and provides good interpretability. Our model is an extension of a combination of two mathematical models: a reaction-diffusion system provides a framework for modeling the influence flows among local groups of regions, and an ecosystem model captures the latent interactions among queries. Moreover, by leveraging the concept of physics-informed neural networks, FluxCube jointly achieves high interpretability, derived from the model parameters, and high forecasting performance. Extensive experiments on real datasets show that FluxCube outperforms comparable models in forecasting accuracy and that each component of FluxCube contributes to the performance gains. We then present several case studies in which FluxCube extracts useful latent interactions between queries and groups of regions.
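A minimal sketch of the kind of dynamics FluxCube combines, with a Lotka-Volterra-style ecosystem term coupling queries and a diffusion term moving activity between regions. All shapes and parameters are illustrative, and the physics-informed neural network that learns them in the paper is not shown.

```python
# Coupled ecosystem + diffusion dynamics simulated with explicit Euler steps.
import numpy as np

def simulate(activity, growth, interaction, diffusion, steps=100, dt=0.01):
    """activity: (regions, queries); growth: (queries,);
    interaction: (queries, queries) query-ecosystem matrix;
    diffusion: (regions, regions) row-normalized influence-flow matrix."""
    x = activity.copy()
    for _ in range(steps):
        ecosystem = x * (growth + x @ interaction.T)   # query-query interactions
        flow = diffusion @ x - x                       # inflow minus outflow across regions
        x = np.clip(x + dt * (ecosystem + flow), 0.0, None)
    return x
```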
We consider the optimization problem of minimizing an objective functional that admits a variational form and is defined over probability distributions on a constrained domain, which poses challenges for both theoretical analysis and algorithm design. Inspired by the mirror descent algorithm, we propose an iterative, particle-based algorithm called Mirrored Variational Transport (MirrorVT). In each iteration, MirrorVT maps the particles to an unconstrained dual space induced by a mirror map and then approximately performs Wasserstein gradient descent on the manifold of distributions defined over the dual space by pushing the particles. At the end of each iteration, the particles are mapped back to the original constrained space. Through simulation experiments, we demonstrate the effectiveness of MirrorVT for minimizing functionals over probability distributions on domains constrained to the simplex and the Euclidean ball. We also analyze its theoretical properties and characterize its convergence to the global minimum of the objective functional.
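A minimal sketch of the mirrored particle update specialized to the simplex with the entropic mirror map, for a simple expected-cost objective E[f(x)]; the paper's general variational objectives and convergence machinery are not reproduced.

```python
# One mirrored particle step: map to the dual space, push, map back to the simplex.
import numpy as np

def softmax(y):
    z = np.exp(y - y.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def mirrored_particle_step(particles, grad_f, lr=0.1, eps=1e-12):
    """particles: (n, d) points on the probability simplex;
    grad_f: returns the (n, d) gradient of the cost at each particle."""
    dual = np.log(particles + eps)          # entropic mirror map to the dual space
    dual -= lr * grad_f(particles)          # gradient push in the unconstrained dual space
    return softmax(dual)                    # map back onto the simplex
```

For this mirror map the update reduces to an exponentiated-gradient step applied independently to each particle.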
This article presents our generative model for rhythm action games together with applications in business operations. Rhythm action games are video games in which the player is challenged to issue commands at the right timings during a music session. The timings are rendered in the chart, which consists of visual symbols, called notes, flying through the screen. We introduce our deep generative model, GenéLive!, which outperforms the state-of-the-art model by taking into account musical structures through beats and temporal scales. Thanks to its favorable performance, GenéLive! was put into operation at KLab Inc., a Japan-based video game developer, and reduced the business cost of chart generation by as much as half. The application target included the phenomenal "Love Live!," which has more than 10 million users across Asia and beyond, and is one of the few rhythm action franchises that has led the online era of the genre. In this article, we evaluate the generative performance of GenéLive! using production datasets at KLab as well as open datasets for reproducibility, while the model continues to operate in their business. Our code and the model, tuned and trained using a supercomputer, are publicly available.